Re: [GENERAL] postgreSQL for storing a database of documents - Mailing list pgsql-general

From Ari Halberstadt
Subject Re: [GENERAL] postgreSQL for storing a database of documents
Date
Msg-id v04003a03b354ecef597e@[192.168.1.2]
Whole thread Raw
In response to postgreSQL for storing a database of documents  (raphael@cs.uky.edu (Raphael Finkel))
List pgsql-general
Raphael Finkel <raphael@cs.uky.edu> wrote:
>Is PostgreSQL appropriate for a database of documents?  I see several
>potential problems.
>
>1.  Tuples are limited to 8KB.  I assume even the "text" type is limited that
>way.  But my documents may be longer than that.

This would pretty much nix the idea of using postgres for a document
repository. You'd need to split the data into separate rows in another
table. I've done this for my bulletin board (though most posts should be
less than 8K in my case), but I think it would be even more of a hastle
with documents. What you can do instead is store descriptive information in
the DB and store the actual documents in the file system with a field in
the document table containing the path to the file. A few years ago I
worked on a commercial document management program that stored attributes
in an Oracle database but kept the actual documents on disk.

>2.  I need to be able to search by content.  I can use a regular-expression
>search in a SELECT ... WHERE clause, but I expect that to be exceedingly
>inefficient.  I would prefer a GLIMPSE-like search ability.

If it's stored outside the database then you can use any kind of text
searching tool. There's a full text add-on for postgres in
contrib/fulltextindex which I've been meaning to install. You could use
this to index all of the attributes so people can do full text searches on
things like comments and descriptions. When presenting the search results
you could merge the results of searching the files with the results of
searching in the text fields.

>3.  I will also be conducting searches by some other fields (like author
>or call number).  But only one field can be a primary key.  Will selection by
>other fields be terribly inefficient for large numbers (about 200K) of large
>tuples (ranging from about 100B to say 10KB)?

You can create as many indexes as you need using create index.

-- Ari Halberstadt mailto:ari@shore.net <http://www.magiccookie.com/>
PGP public key available at <http://www.magiccookie.com/pgpkey.txt>



pgsql-general by date:

Previous
From: Dustin Sallings
Date:
Subject: Re: [GENERAL] postgreSQL for storing a database of documents
Next
From: Boris Goldowsky
Date:
Subject: RE: [GENERAL] Thing that makes me go "Hmmm"